{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "adf7d63d",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Vectara Managed Index\n",
    "In this notebook we are going to show how to use [Vectara](https://vectara.com) with LlamaIndex.\n",
    "Vectara is the first example of a \"Managed\" Index, a new type of index in Llama-index which is managed via an API."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfe2497c",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74158792",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-indices-managed-vectara"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6019e01a",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "359797ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.indices.managed.vectara import VectaraIndex\n",
    "from llama_index.core import SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `Uber 10q` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "documents loaded into 3 document objects\n",
      "Document ID of first doc is ec2e38f2-c333-4f46-b2a1-0b3f08b9be6e\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "documents = SimpleDirectoryReader(os.path.abspath(\"../data/10q/\")).load_data()\n",
    "print(f\"documents loaded into {len(documents)} document objects\")\n",
    "print(f\"Document ID of first doc is {documents[0].doc_id}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Add the content of the documents into a pre-created Vectara corpus\n",
    "Here we assume an empty corpus is created and the details are available as environment variables:\n",
    "* VECTARA_CORPUS_ID\n",
    "* VECTARA_CUSTOMER_ID\n",
    "* VECTARA_API_KEY"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8dfdb0a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectaraIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the Vectara Index\n",
    "We can now ask questions using the VectaraIndex retriever."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb174ec3",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"Is Uber still losing money or have they achieved profitability?\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52878fd2",
   "metadata": {},
   "source": [
    "First we use the retriever to list the returned documents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Most jurisdictions in which we operate have laws that govern payment and financial services activities. Regulators in certain jurisdictions may determine that\n",
      "certain aspects of our business are subject to these laws and could require us to obtain licenses to continue to operate in such jurisdictions. For example, our\n",
      "subsidiary in the Netherlands, Uber Payments B.V., is registered and authorized by its competent authority, De Nederlandsche Bank, as an electronic money\n",
      "institution. This authorization permits Uber Payments B.V. to provide payment services (including acquiring and executing payment transactions and money\n",
      "remittances, as referred to in the Revised Payment Services Directive (2015/2366/EU)) and to issue electronic money in the Netherlands. In addition, Uber\n",
      "Payments B.V. has notified De Nederlandsche Bank that it will provide such services on a cross-border passport basis into other countries within the EEA.\n",
      "--\n",
      "Most jurisdictions in which we operate have laws that govern payment and financial services activities. Regulators in certain jurisdictions may determine that\n",
      "certain aspects of our business are subject to these laws and could require us to obtain licenses to continue to operate in such jurisdictions. For example, our\n",
      "subsidiary in the Netherlands, Uber Payments B.V., is registered and authorized by its competent authority, De Nederlandsche Bank, as an electronic money\n",
      "institution. This authorization permits Uber Payments B.V. to provide payment services (including acquiring and executing payment transactions and money\n",
      "remittances, as referred to in the Revised Payment Services Directive (2015/2366/EU)) and to issue electronic money in the Netherlands. In addition, Uber\n",
      "Payments B.V. has notified De Nederlandsche Bank that it will provide such services on a cross-border passport basis into other countries within the EEA.\n",
      "--\n",
      "The court granted our motion to defer the summary judgment motion on January 12, 2022. Our chances of success on\n",
      "the merits are still uncertain and any reasonably possible loss or range of loss cannot be estimated. Swiss Social Security Reclassification\n",
      "Several Swiss administrative bodies have issued decisions in which they classify Drivers as employees of Uber Switzerland, Rasier Operations B.V. or of Uber\n",
      "B.V. for social security or labor purposes. We are challenging each of them before the Social Security and Administrative Tribunals. In April 2021, a ruling was made that Uber Switzerland could not be held liable for social security contributions.\n",
      "--\n",
      "The court granted our motion to defer the summary judgment motion on January 12, 2022. Our chances of success on\n",
      "the merits are still uncertain and any reasonably possible loss or range of loss cannot be estimated. Swiss Social Security Reclassification\n",
      "Several Swiss administrative bodies have issued decisions in which they classify Drivers as employees of Uber Switzerland, Rasier Operations B.V. or of Uber\n",
      "B.V. for social security or regulatory purposes. We are challenging each of them before the Social Security and Administrative Tribunals. In April 2021, a ruling\n",
      "was made that Uber Switzerland could not be held liable for social security contributions.\n",
      "--\n",
      "The court granted our motion to defer the summary judgment motion on January 12, 2022 and summary judgment\n",
      "papers will be fully briefed by May 31, 2023. Our chances of success on the merits are still uncertain and any reasonably possible loss or range of loss cannot be\n",
      "estimated. Swiss Social Security Rulings\n",
      "Several Swiss administrative bodies have issued decisions in which they classify Drivers as employees of Uber Switzerland, Rasier Operations B.V. or of Uber\n",
      "B.V. for social security or labor purposes. We are challenging each of them before the Social Security and Administrative Tribunals. In April 2021, a ruling was made that Uber Switzerland could not be held liable for social security contributions.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(similarity_top_k=5)\n",
    "response = query_engine.retrieve(query)\n",
    "texts = [t.node.text for t in response]\n",
    "print(\"\\n--\\n\".join(texts))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fbf0c93",
   "metadata": {},
   "source": [
    "with the as_query_engine(), we can ask questions and get the responses based on Vectara's full RAG pipeline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "890f7133",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "As of the provided search results, there is no direct information about Uber's current financial state or whether they have achieved profitability. However, the search results mention that Uber is facing regulatory challenges in different jurisdictions [1][3]. These challenges involve the classification of drivers as employees and social security contributions [3]. The outcome of these cases could affect Uber's financial situation. It is important to note that the search results did not provide a clear answer regarding whether Uber is still losing money or if they have achieved profitability.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(similarity_top_k=5)\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc0874ad",
   "metadata": {},
   "source": [
    "Note that the \"response\" object above includes both the summary text but also the source documents used to provide this response (citations)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e49914a",
   "metadata": {},
   "source": [
    "Vectara supports max-marginal-relevance natively in the backend, and this is available as a query mode. \n",
    "Let's see an example of how to use MMR: We will run the same query \"Is Uber still losing money or have they achieved profitability?\" but this time we will use MMR where mmr_diversity_bias=1.0 which maximizes the focus on maximum diversity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72832e45",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We are challenging each of them before the Social Security and Administrative Tribunals. In April 2021, a ruling was made that Uber Switzerland could not be held liable for social security contributions. The litigations with regards to Uber B.V. and\n",
      "Rasier Operations B.V. are still pending for years 2014 to 2019. In January 2022, the Social Security Tribunal of Zurich reclassified drivers who have used the App\n",
      "in 2014 as dependent workers of Uber B.V. and Rasier Operations B.V. from a social security standpoint, but this ruling has been appealed before the Federal\n",
      "Tribunal and has no impact on our current operations. On June 3, 2022, the Federal Tribunal issued two rulings by which both Drivers and Couriers in the canton of\n",
      "Geneva are classified as employees of Uber B.V. and Uber Switzerland GmbH.\n",
      "--\n",
      "If the requirement is not repealed or modified, our financial condition, operating results, and cash flows may be adversely impacted\n",
      "by this legislation. In August 2022, the Inflation Reduction Act, or the IRA, was enacted, the provisions of which include a minimum tax equal to 15% of the\n",
      "adjusted financial statement income of certain large corporations, as well as a 1% excise tax on certain share buybacks by public corporations that would be\n",
      "imposed on such corporations. Pending further guidance, it is possible that the IRA could increase our future tax liability, which could in turn adversely impact our\n",
      "business and future profitability. We are unable to predict what global or U.S. tax reforms may be proposed or enacted in the future or what effects such future changes would have on our\n",
      "business. Any such changes in tax legislation, regulations, policies or practices in the jurisdictions in which we operate could increase the estimated tax liability that\n",
      "we have expensed to date and paid or accrued on our balance sheet; affect our financial position, future operating results, cash flows, and effective tax rates where\n",
      "we have operations; reduce post-tax returns to our stockholders; and increase the complexity, burden, and cost of tax compliance.\n",
      "--\n",
      "Most jurisdictions in which we operate have laws that govern payment and financial services activities. Regulators in certain jurisdictions may determine that\n",
      "certain aspects of our business are subject to these laws and could require us to obtain licenses to continue to operate in such jurisdictions. For example, our\n",
      "subsidiary in the Netherlands, Uber Payments B.V., is registered and authorized by its competent authority, De Nederlandsche Bank, as an electronic money\n",
      "institution. This authorization permits Uber Payments B.V. to provide payment services (including acquiring and executing payment transactions and money\n",
      "remittances, as referred to in the Revised Payment Services Directive (2015/2366/EU)) and to issue electronic money in the Netherlands. In addition, Uber\n",
      "Payments B.V. has notified De Nederlandsche Bank that it will provide such services on a cross-border passport basis into other countries within the EEA.\n",
      "--\n",
      "Taxing authorities have appealed the orders related to tax issues and plan confirmation but did not\n",
      "appeal the settlement approval. Uber is not a party to those appeals. The taxing authorities’ chances of success on the merits are still uncertain and any reasonably\n",
      "possible loss or range of loss is immaterial. Non-Income Tax Matters\n",
      "We recorded an estimated liability for contingencies related to non-income tax matters and are under audit by various domestic and foreign tax authorities with\n",
      "regard to such matters. The subject matter of these contingent liabilities and non-income tax audits primarily arises from our transactions with Drivers, as well as\n",
      "the tax treatment of certain employee benefits and related employment taxes.\n",
      "--\n",
      "Revenue growth outpaced Gross Bookings growth primarily due to a $1.3 billion increase in our Freight\n",
      "business, primarily due to the acquisition of Transplace during the fourth quarter of 2021, and the net favorable impact to Mobility revenue of $1.1 billion as a\n",
      "result of business model changes in the UK. Net loss attributable to Uber Technologies, Inc. was $1.2 billion, which includes the unfavorable impact of a pre-tax unrealized loss on debt and equity\n",
      "securities, net of $550 million primarily related to changes in the fair value of our marketable equity securities, including: a $641 million loss on our Didi\n",
      "investments, partially offset by a $90 million gain on our Aurora investment. Net loss attributable to Uber Technologies, Inc. also includes $482 million of stock-\n",
      "based compensation expense. Adjusted EBITDA was $516 million, up $508 million compared to the same period in 2021. Mobility Adjusted EBITDA profit was $898 million, up $354\n",
      "million compared to the same period in 2021.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=5,\n",
    "    n_sentences_before=2,\n",
    "    n_sentences_after=2,\n",
    "    vectara_query_mode=\"mmr\",\n",
    "    mmr_k=50,\n",
    "    mmr_diversity_bias=1.0,\n",
    ")\n",
    "response = query_engine.retrieve(query)\n",
    "\n",
    "texts = [t.node.text for t in response]\n",
    "print(\"\\n--\\n\".join(texts))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53e76fd1",
   "metadata": {},
   "source": [
    "As you can see, the results in this case are much more diverse, and for example do not contain the same text more than once. The response is also better since the LLM had a more diverse set of facts to ground its response on:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10bf0fb9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the search results, the profitability of Uber is still uncertain. There are ongoing litigations and regulatory challenges in various jurisdictions regarding labor classification, social security contributions, and tax matters [1][3][4][6]. While Uber has reported revenue growth and improved adjusted EBITDA, it also incurred net losses due to factors such as unrealized losses on investments and stock-based compensation expenses [5]. The outcome of these legal and regulatory issues may impact Uber's financial condition and future profitability [1][3]. Therefore, it cannot be definitively stated whether Uber has achieved profitability or is still losing money.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=5,\n",
    "    n_sentences_before=2,\n",
    "    n_sentences_after=2,\n",
    "    summary_enabled=True,\n",
    "    vectara_query_mode=\"mmr\",\n",
    "    mmr_k=50,\n",
    "    mmr_diversity_bias=1.0,\n",
    ")\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e863b914",
   "metadata": {},
   "source": [
    "So far we've used Vectara's internal summarization capability, which is the best way for most users.\n",
    "\n",
    "You can still use Llama-Index's standard VectorStore as_query_engine() method, in which case Vectara's summarization won't be used, and you would be using an external LLM (like OpenAI's GPT-4 or similar) and a cutom prompt from LlamaIndex to generate the summart. For this option just set summary_enabled=False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b0a49d1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uber is still losing money and has not achieved profitability.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=5,\n",
    "    summary_enabled=False,\n",
    "    vectara_query_mode=\"mmr\",\n",
    "    mmr_k=50,\n",
    "    mmr_diversity_bias=0.5,\n",
    ")\n",
    "response = query_engine.query(query)\n",
    "print(response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
